Abstract: Given a family of binary-input memoryless output-
symmetric (BMS) channels having a fixed capacity, we derive the
BMS channel having the highest (resp. lowest) capacity among all 
channels that are degraded (resp. upgraded) with respect to the 
whole family. We give an explicit characterization of this channel 
as well as an explicit formula for the capacity of this channel. 

I. Introduction 

Channel ordering plays an important role in analyzing the 
asymptotic behavior of iterative coding systems over binary- 
input memoryless output-symmetric (BMS) channels. We say 
that two BMS channels are ordered by a real-valued parameter, 
if the "worse" channel has a larger parameter than the "better" 
channel. E.g., we might consider as a real-valued parameter the
bit-error probability, the capacity, or perhaps the Bhattacharyya
parameter.

One particularly useful way of imposing an order is to
consider the notion of degradation [1]. The main reason that
this order is so useful is that many important functionals and
kernels (defined later) associated with BMS channels either
preserve or reverse the ordering induced by degradation. In
addition, the order of degradation is preserved under many
natural operations one might perform on channels.

Instead of considering a single BMS channel, we are inter- 
ested in a family of BMS channels sharing certain properties. 
The cardinality of the family can be either finite or infinite. A 
particularly important example for our purpose is the family of 
BMS channels having fixed capacity c. This family is denoted 
by {BMS(c)}. The question we address is the following: given
the family of all BMS channels of capacity c, can we find the
"best" degraded channel and the "worst" upgraded channel
with respect to this family? That is, can we find the channel that
is degraded with respect to all members of the family and has
the highest capacity, as well as the channel that is upgraded
with respect to all members of the family and has the lowest
capacity?

Determining these channels is both natural and fundamental.
To mention just one possible application, assume that we
want to construct a polar code [2] which works for all channels
of a particular capacity. If we can find a channel which is
degraded with respect to all elements in the family, then a
polar code designed for this channel will work a fortiori for
all elements of the original family. We then say that this polar
code is universal. This simple bound is somewhat too crude
to construct a good universal polar code. But we can apply
the same argument not to the original family but after a few
steps of the polarization process. The more steps we perform,
the better the overall code will be (i.e., it will have higher
capacity) and the closer we will get to an optimal construction.
In what follows we will not be concerned with applications 
but we will only be interested in describing these two extreme 
channels given a channel family. As we will see, we will be 
able to give an explicit answer to the question. 

A. Channel Model 

A binary-input memoryless output-symmetric (BMS) channel
has an input $X$ taking values over an input alphabet $\mathcal{X}$,
an output $Y$ taking values over an output alphabet $\mathcal{Y}$, and a
transition probability $p_{Y|X}(y|x)$ for each $x \in \mathcal{X}$ and $y \in \mathcal{Y}$.
Throughout the paper we take the input alphabet $\mathcal{X}$ to be $\{\pm 1\}$
and the output alphabet $\mathcal{Y}$ to be a subset of $\bar{\mathbb{R}} = [-\infty, +\infty]$.

Given a BMS channel $a$, an alternative description of the
channel is the L-distribution of the random variable

$$L = l(Y) \triangleq \ln \frac{p_{Y|X}(Y|1)}{p_{Y|X}(Y|-1)}.$$

A closely related random variable is

$$D = d(Y) \triangleq \tanh\left(\frac{l(Y)}{2}\right) = \frac{1 - e^{-l(Y)}}{1 + e^{-l(Y)}},$$

whose distribution is called a D-distribution, conditioned on
$X = 1$. Moreover, denote by $|D|$ the absolute value of the
random variable $D$. Then, the distribution of $|D|$ is termed
a |D|-distribution and the associated density is called a |D|-density.
Given a BMS channel $a$, we denote by $|\mathfrak{A}|$ and $|a|$ the
associated |D|-distribution and |D|-density, respectively. The
|D|-density of a discrete BMS channel is of the form

$$|a|(x) = \sum_{i=1}^{n} \alpha_i \delta(x - x_i), \tag{1}$$

where $0 < \alpha_i \le 1$, $\sum_{i=1}^{n} \alpha_i = 1$, and $0 \le x_1 < x_2 < \cdots < x_n \le 1$;
while $\delta(\cdot)$ is the (Dirac) delta function and $n \in \mathbb{N}_+$
is finite or countably infinite.

B. Functionals of Densities 

The capacity of a BMS channel can be defined as a linear
functional of its |D|-density [1]. Formally, the capacity $C(|a|)$
in bits per channel use of a BMS channel with |D|-density $|a|$
is defined as

$$C(|a|) = \int_0^1 |a|(x)\,(1 - h(x))\,dx,$$

where $h(x) = h_2((1 - x)/2)$, for $x \in [0,1]$, and
$h_2(x) = -x\log_2(x) - (1 - x)\log_2(1 - x)$ is the binary entropy function.
The entropy functional is defined as $H(|a|) = 1 - C(|a|)$, and
the entropy of a discrete BMS channel is $\sum_{i=1}^{n} \alpha_i h(x_i)$.

Another important functional is the Bhattacharyya parameter
associated with the |D|-density $|a|$. It is defined as

$$\mathfrak{B}(|a|) = \int_0^1 |a|(x)\sqrt{1 - x^2}\,dx.$$

If a BMS channel has a |D|-density of the form in (1), the
Bhattacharyya parameter is given by $\sum_{i=1}^{n} \alpha_i\sqrt{1 - x_i^2}$.
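As an illustration of these functionals, the following minimal Python sketch evaluates the entropy, capacity and Bhattacharyya parameter of a discrete |D|-density given as lists of masses and positions. The function names and the list-based representation are our own choices, not notation from the paper. The two test channels use the representations that appear later in the text: a BSC with crossover probability eps is a single mass at 1 - 2 eps, and a BEC with erasure probability e has mass e at 0 and mass 1 - e at 1.

```python
import math

def h2(p):
    """Binary entropy in bits, with the convention h2(0) = h2(1) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)

def h(x):
    """Entropy kernel in the |D|-domain: h(x) = h2((1 - x) / 2)."""
    return h2((1.0 - x) / 2.0)

def entropy(masses, points):
    """Entropy of a discrete |D|-density: sum_i alpha_i * h(x_i)."""
    return sum(a * h(x) for a, x in zip(masses, points))

def capacity(masses, points):
    """Capacity in bits per channel use, C = 1 - H (masses sum to 1)."""
    return 1.0 - entropy(masses, points)

def bhattacharyya(masses, points):
    """Bhattacharyya parameter: sum_i alpha_i * sqrt(1 - x_i^2)."""
    return sum(a * math.sqrt(1.0 - x * x) for a, x in zip(masses, points))

# Hypothetical example channels (values chosen for illustration only).
eps = 0.11
bsc = ([1.0], [1.0 - 2.0 * eps])               # BSC(eps)
erasure = 0.5
bec = ([erasure, 1.0 - erasure], [0.0, 1.0])   # BEC(erasure)

print("BSC:", capacity(*bsc), bhattacharyya(*bsc))
print("BEC:", capacity(*bec), bhattacharyya(*bec))
```

For the BSC the Bhattacharyya value agrees with the familiar expression 2 sqrt(eps (1 - eps)), and for the BEC both the entropy and the Bhattacharyya parameter equal the erasure probability.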

C. Degradedness 

Denote by $p_{Y|X}(y|x)$ and $p_{Z|X}(z|x)$ the transition
probabilities of two BMS channels $a$ and $a'$, respectively. Let $\mathcal{X}$
be the common input alphabet, and denote by $\mathcal{Y}$ and $\mathcal{Z}$ the
output alphabets, respectively. We say that $a'$ is (stochastically)
degraded with respect to $a$ if there exists a memoryless channel
with transition probability $p_{Z|Y}(z|y)$ such that

$$p_{Z|X}(z|x) = \sum_{y \in \mathcal{Y}} p_{Z|Y}(z|y)\, p_{Y|X}(y|x)$$

for all $x \in \mathcal{X}$ and all $z \in \mathcal{Z}$. Conversely, the channel $a$ is said
to be (stochastically) upgraded with respect to the channel $a'$.
The following lemma gives an equivalent characterization of
degradedness and upgradedness [1].

Lemma 1. Consider two BMS channels $a$ and $a'$ with the
corresponding |D|-distributions $|\mathfrak{A}|$ and $|\mathfrak{A}'|$, respectively. The
following statements are equivalent:

(i) $a'$ is degraded with respect to $a$;

(ii) $\int_0^1 f(x)\,d|\mathfrak{A}|(x) \le \int_0^1 f(x)\,d|\mathfrak{A}'|(x)$, for all $f$ that are
non-increasing and convex-∩ on [0,1];

(iii) $\int_z^1 |\mathfrak{A}|(x)\,dx \le \int_z^1 |\mathfrak{A}'|(x)\,dx$, for all $z \in [0,1]$.
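Condition (iii) is easy to test numerically for discrete channels. The sketch below is a hedged illustration, not the authors' procedure: it assumes finitely many masses, uses the fact that for a discrete channel the integral of the |D|-distribution from z to 1 equals sum_i alpha_i (1 - max{z, x_i}), and introduces the hypothetical helper names `tail_integral` and `is_degraded`.

```python
def tail_integral(masses, points, z):
    """Integral of the |D|-distribution from z to 1 for a discrete channel."""
    return sum(a * (1.0 - max(z, x)) for a, x in zip(masses, points))

def is_degraded(ch_prime, ch, tol=1e-12):
    """True if ch_prime is degraded with respect to ch, via condition (iii)."""
    # Both tail integrals are piecewise linear in z with kinks only at the
    # mass positions, so comparing at those positions and at 0 and 1 suffices.
    grid = sorted({0.0, 1.0, *ch[1], *ch_prime[1]})
    return all(
        tail_integral(*ch, z) <= tail_integral(*ch_prime, z) + tol
        for z in grid
    )

# Channels as (masses, positions); a BSC(eps) is one mass at 1 - 2*eps.
bsc_010 = ([1.0], [0.8])               # BSC(0.10)
bsc_020 = ([1.0], [0.6])               # BSC(0.20)
bsc_011 = ([1.0], [0.78])              # BSC(0.11), capacity about 0.5
bec_050 = ([0.5, 0.5], [0.0, 1.0])     # BEC(0.5), capacity 0.5

print(is_degraded(bsc_020, bsc_010))   # noisier BSC versus cleaner BSC
print(is_degraded(bec_050, bsc_011), is_degraded(bsc_011, bec_050))
```

The last line illustrates why a channel family of fixed capacity is not totally ordered: the BEC and the BSC of (roughly) equal capacity are not comparable under degradation.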

II. Least Degraded Channel 

Now recall that the entropy functional $H(|a|)$ associated
with the |D|-density $|a|$ has $h_2((1 - x)/2)$ as its kernel in the
|D|-domain. This kernel is non-increasing and convex-∩ on
[0,1], and therefore condition (ii) in Lemma 1 shows that a
degraded channel has a larger entropy, and thus lower capacity,
than the original channel. Similarly, a degraded channel has a
larger probability of error and larger Bhattacharyya parameter
than the original channel.

Property 2. Note that there are many equivalent definitions 
for the least degraded channel. One particularly insightful 
characterization is that the least degraded channel is the unique 
channel which is degraded with respect to all elements of the 
family {BMS(c)} and is upgraded with respect to any other 
such channel. Therefore it has the highest capacity, lowest error
probability and lowest Bhattacharyya parameter.

The integral of $|\mathfrak{A}|(x)$ from $z$ to 1, as suggested by Lemma 1,
is important for characterizing degradedness among BMS
channels. Therefore, for a BMS channel $a$, define

$$\Lambda_a(z) \triangleq \int_z^1 |\mathfrak{A}|(x)\,dx. \tag{2}$$

Note that $\Lambda_a(z)$ is decreasing on [0,1], and since $|\mathfrak{A}|(x)$ is
increasing in $x$, it follows that $\Lambda_a(z)$ is a convex-∩ function
of $z$. Moreover, if the BMS channel $a$ is discrete, the function
$\Lambda_a(z)$ has a simpler form given by

$$\Lambda_a(z) = \sum_{i=1}^{n} \alpha_i \left(1 - \max\{z, x_i\}\right). \tag{3}$$

Condition (iii) of Lemma 1 shows that, in order to find
the least degraded channel with respect to {BMS(c)}, we take
the maximum value of $\Lambda_a(z)$ among all BMS channels $a \in$
{BMS(c)}, for each fixed $z \in [0,1]$. That is, define

$$\overline{\Lambda}(z) = \max\{\Lambda_a(z) : a \in \{\text{BMS}(c)\}\}, \tag{4}$$

for every $z \in [0,1]$. Taking the convex-∩ envelope of $\overline{\Lambda}(z)$
then characterizes the desired channel. The following theorem
characterizes $\overline{\Lambda}(z)$ exactly.

Theorem 3. Consider the family of BMS channels {BMS(c)}
of capacity $c$, $0 < c < 1$. Then,

$$\overline{\Lambda}(z) = \begin{cases} \dfrac{(1 - c)(1 - z)}{h(z)} & \text{if } z \in [0, 1 - 2\epsilon_{\text{bsc}}), \\ 1 - z & \text{if } z \in [1 - 2\epsilon_{\text{bsc}}, 1], \end{cases}$$

where $\epsilon_{\text{bsc}} \in (0, \tfrac{1}{2})$ is the solution of $1 - h_2(\epsilon_{\text{bsc}}) = c$.

Proof: First, it is clear that $\sum_{i=1}^{n} \alpha_i (1 - \max\{z, x_i\}) \le
\sum_{i=1}^{n} \alpha_i (1 - z) = 1 - z$, and for $z \in [1 - 2\epsilon_{\text{bsc}}, 1]$ this value,
$1 - z$, is achieved if the underlying BMS channel is the BSC
with crossover probability equal to $\epsilon_{\text{bsc}}$.

Now for any fixed $z \in [0, 1 - 2\epsilon_{\text{bsc}})$, assume that $\overline{\Lambda}(z)$
is achieved by the BMS channel $d$, i.e., $\overline{\Lambda}(z) = \Lambda_d(z)$. We
claim that $d$ does not have any probability mass in the interval
$(z, 1)$. We show this by contradiction. Suppose that there exists
a probability mass $\alpha_0$ at $x_0 \in (z, 1)$. Define $\rho = h(x_0)/h(z)$.
It is clear that $\rho \in (0, 1)$, since the function $h(\cdot)$ is decreasing
and convex-∩ on [0,1]. The definition of $\rho$ gives

$$\alpha_0 h(x_0) = \alpha_0 \rho\, h(z) + \alpha_0 (1 - \rho)\, h(1), \tag{5}$$

which means that, instead of putting the single probability
mass $\alpha_0$ at $x_0$, we can split $\alpha_0$ into two masses, $\alpha_0\rho$ and
$\alpha_0(1 - \rho)$, and put these two masses at $z$ and 1, respectively,
without changing the entropy and thus the capacity. Canceling
$\alpha_0$ on both sides of (5) and using the fact that $h(\cdot)$ is decreasing
and convex-∩ on [0,1], we have

$$h(x_0) < h(\rho z + 1 - \rho),$$

which is equivalent to $1 - x_0 < \rho(1 - z)$. Now notice that, since
$z < x_0 < 1$, the term corresponding to $x_0$ that contributes to
$\Lambda_d(z)$ is equal to $\alpha_0(1 - x_0)$. But

$$\alpha_0(1 - x_0) < \alpha_0\rho(1 - z) = \alpha_0\rho(1 - z) + \alpha_0(1 - \rho)(1 - 1),$$

which means that the value of $\Lambda_d(z)$ can be increased by the
splitting operation mentioned above. This, however, contradicts
the assumption that $\Lambda_d(z)$ is the maximum value at $z$. If there
exist some probability masses on the interval $[0, z)$, we can
add these masses to the probability mass at $z$ and delete the
original masses, without changing $\Lambda_d(z)$ and without violating
the entropy (or capacity) constraint. Thus, the channel $d$ has
probability masses at points $z$ and 1 only, and the probability
mass at $z$ is equal to $(1 - c)/h(z)$, completing the proof. ■
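The point-wise maximum is straightforward to evaluate. The following sketch is our own illustration (the names `eps_bsc`, `lambda_max` and `lambda_discrete` are hypothetical): it finds the crossover probability of the BSC of capacity c by bisection, evaluates the two-branch formula (1 - c)(1 - z)/h(z) on [0, 1 - 2 eps) and 1 - z beyond, and checks it against the two-mass channel constructed in the proof, which places mass (1 - c)/h(z) at z and the rest at 1.

```python
import math

def h2(p):
    """Binary entropy in bits, with h2(0) = h2(1) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)

def h(x):
    """Entropy kernel in the |D|-domain."""
    return h2((1.0 - x) / 2.0)

def eps_bsc(c):
    """Solve 1 - h2(eps) = c for eps in (0, 1/2) by bisection."""
    lo, hi = 0.0, 0.5
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if h2(mid) < 1.0 - c:      # h2 is increasing on [0, 1/2]
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def lambda_max(z, c):
    """Point-wise maximum of the Lambda function over channels of capacity c."""
    eps = eps_bsc(c)
    if z >= 1.0 - 2.0 * eps:
        return 1.0 - z
    return (1.0 - c) * (1.0 - z) / h(z)

def lambda_discrete(masses, points, z):
    """Lambda function of a discrete channel, as in (3)."""
    return sum(a * (1.0 - max(z, x)) for a, x in zip(masses, points))

c, z = 0.5, 0.3                      # illustrative values
eps = eps_bsc(c)
a_z = (1.0 - c) / h(z)               # mass at z from the proof
maximizer = ([a_z, 1.0 - a_z], [z, 1.0])
cap = 1.0 - sum(a * h(x) for a, x in zip(*maximizer))

print(eps, lambda_max(z, c), lambda_discrete(*maximizer, z), cap)
```

At this z the maximizer beats both the BSC and the BEC of the same capacity, which is the behavior the theorem predicts for z below 1 - 2 eps.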

Recall that the function $\Lambda$ associated to a BMS channel
is convex-∩ on [0,1]. Thus, taking the convex-∩ envelope of
$\overline{\Lambda}(z)$ gives the $\Lambda$ function, call it $\Lambda^*(z)$, of the least degraded
channel with respect to the whole channel family {BMS(c)}.

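Corollary 4. The least degraded channel is characterized by

$$\Lambda^*(z) = \begin{cases} 1 - c - \dfrac{1 - c - 2\epsilon_{\text{bsc}}}{1 - 2\epsilon_{\text{bsc}}}\, z & \text{if } z \in [0, 1 - 2\epsilon_{\text{bsc}}), \\ 1 - z & \text{if } z \in [1 - 2\epsilon_{\text{bsc}}, 1]. \end{cases}$$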



With this characterization, we can derive the exact
formula of the capacity of the least degraded channel.

Theorem 5. Given a family of BMS channels of capacity $c$, the
capacity in bits per channel use of the least degraded channel
is given by

$$C^* = \frac{c^2}{1 - 2\epsilon_{\text{bsc}}}. \tag{6}$$

Proof: Recall that the entropy can be expressed in the
following alternative form:

$$H(|a|) = \int_0^1 \frac{\Lambda_a(z)}{\ln 2\,(1 - z^2)}\,dz. \tag{7}$$

Inserting the formula for $\Lambda^*(z)$ into (7) and integrating over
$z$ gives the desired result. ■
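The capacity of the least degraded channel can be cross-checked numerically. The sketch below rests on two assumptions of ours that are not spelled out in the text: first, that the piecewise-linear function of Corollary 4 corresponds to a channel with mass (1 - c - 2 eps)/(1 - 2 eps) at position 0 and mass c/(1 - 2 eps) at position 1 - 2 eps (read off from the slopes, since the slope of a discrete Lambda function at z is minus the total mass below z); second, that the closed form is C^* = c^2/(1 - 2 eps). Function names are hypothetical.

```python
import math

def h2(p):
    """Binary entropy in bits, with h2(0) = h2(1) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)

def h(x):
    """Entropy kernel in the |D|-domain."""
    return h2((1.0 - x) / 2.0)

def eps_bsc(c):
    """Solve 1 - h2(eps) = c for eps in (0, 1/2) by bisection."""
    lo, hi = 0.0, 0.5
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if h2(mid) < 1.0 - c:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def least_degraded_capacity(c):
    """Return (capacity from the two masses, assumed closed form)."""
    eps = eps_bsc(c)
    x_star = 1.0 - 2.0 * eps
    mass_at_zero = (1.0 - c - 2.0 * eps) / x_star   # assumption, see lead-in
    mass_at_x = c / x_star
    from_masses = mass_at_zero * (1.0 - h(0.0)) + mass_at_x * (1.0 - h(x_star))
    closed_form = c * c / x_star
    return from_masses, closed_form

for c in (0.1, 0.5, 0.9):
    from_masses, closed_form = least_degraded_capacity(c)
    print(c, from_masses, closed_form, c - closed_form)
```

The last printed column is the gap between c and the capacity of the least degraded channel; under the stated assumptions it matches the corresponding column of Table I to the reported precision.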

III. Least Upgraded Channel 

Property 6. The least upgraded channel is the unique channel
which is upgraded with respect to all elements of the family
{BMS(c)} and is degraded with respect to any other such
channel. Therefore this channel has the lowest capacity, highest
error probability and highest Bhattacharyya parameter.

In order to specify the least upgraded channel, we first 
notice the following lemma. 

Lemma 7. For any BMS channel associated with the |D|-density
of the form in (1) and having capacity $c$, $0 < c < 1$,
we have $1 - 2\epsilon_{\text{bsc}} \le x_n \le 1$.

Proof: Assume on the contrary that $x_n < 1 - 2\epsilon_{\text{bsc}}$. Then,
the monotonicity property of $h(\cdot)$ gives

$$h(x_n) > h(1 - 2\epsilon_{\text{bsc}}) = 1 - c,$$

which, since $x_i \le x_n$ for all $i$, implies $\sum_{i=1}^{n} \alpha_i h(x_i) > 1 - c$, contradicting
the assumption that the entropy is $1 - c$. ■

Now, for each fixed $z \in [0,1]$, we take the minimum value
of $\Lambda_a(z)$ among all BMS channels $a \in$ {BMS(c)}, i.e., define

$$\underline{\Lambda}(z) = \min\{\Lambda_a(z) : a \in \{\text{BMS}(c)\}\}. \tag{8}$$

Suppose that, for any fixed $z \in [0,1]$, the minimum value is
achieved by the channel $u$, i.e., $\underline{\Lambda}(z) = \Lambda_u(z)$. For channel
$u$, the number of probability masses on the interval $[z, 1]$ is
characterized by the following lemma.

Lemma 8. The channel $u$, which achieves the minimum value
$\underline{\Lambda}(z)$ at $z$, has at most one probability mass on $[z, 1]$.



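Proof: Assume there are two probability masses $\alpha_-$ and
$\alpha_+$ at $x_- \in [z, 1]$ and $x_+ \in [z, 1]$, respectively. Then, there
exists a point $\hat{x}$ on $[x_-, x_+]$ such that

$$\alpha_- h(x_-) + \alpha_+ h(x_+) = (\alpha_- + \alpha_+)\, h(\hat{x}). \tag{9}$$

Dividing both sides of (9) by $(\alpha_- + \alpha_+)$ and using the convex-∩
property of $h(\cdot)$ gives

$$h(\hat{x}) < h\left(\frac{\alpha_- x_- + \alpha_+ x_+}{\alpha_- + \alpha_+}\right),$$

which means that $\alpha_- x_- + \alpha_+ x_+ < (\alpha_- + \alpha_+)\hat{x}$. Since the
terms corresponding to $x_-$ and $x_+$ that contribute to $\Lambda_u(z)$ are
equal to $\alpha_-(1 - x_-) + \alpha_+(1 - x_+)$, we have

$$\alpha_-(1 - x_-) + \alpha_+(1 - x_+) = (\alpha_- + \alpha_+)\left(1 - \frac{\alpha_- x_- + \alpha_+ x_+}{\alpha_- + \alpha_+}\right) > (\alpha_- + \alpha_+)(1 - \hat{x}),$$

which shows that by combining two probability masses $\alpha_-$ and
$\alpha_+$ into a single probability mass $(\alpha_- + \alpha_+)$ at position $\hat{x}$,
the value of $\Lambda_u(z)$ is decreased, contradicting the assumption
that $\Lambda_u(z)$ is minimal at point $z$. ■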
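The merging step in this proof can be illustrated numerically. In the sketch below (our own example with hypothetical names and arbitrary mass values), two masses on [z, 1] are replaced by a single mass at the point that preserves their entropy contribution, found by bisection using the fact that the kernel h is decreasing; the contribution to the Lambda function then drops.

```python
import math

def h2(p):
    """Binary entropy in bits, with h2(0) = h2(1) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)

def h(x):
    """Entropy kernel in the |D|-domain."""
    return h2((1.0 - x) / 2.0)

def merge_point(a_minus, x_minus, a_plus, x_plus):
    """Point x_hat in [x_minus, x_plus] preserving the entropy contribution."""
    target = (a_minus * h(x_minus) + a_plus * h(x_plus)) / (a_minus + a_plus)
    lo, hi = x_minus, x_plus
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if h(mid) > target:    # h is decreasing, so move right
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Two illustrative masses, both located at or above some z <= 0.5.
a_minus, x_minus = 0.3, 0.5
a_plus, x_plus = 0.2, 0.9
x_hat = merge_point(a_minus, x_minus, a_plus, x_plus)

before = a_minus * (1.0 - x_minus) + a_plus * (1.0 - x_plus)
after = (a_minus + a_plus) * (1.0 - x_hat)
print(x_hat, before, after)
```

The merged position lies strictly to the right of the mass-weighted average of the two original positions, which is exactly the inequality used in the proof.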

Lemma 7 and Lemma 8 show that the channel $u$ has at most
two probability masses on the interval [0,1]. If it has only one
probability mass, it is in fact a BSC. Otherwise, in general,
there are two probability masses, call them $\gamma$ and $1 - \gamma$, on
the intervals $[0, z]$ and $[1 - 2\epsilon_{\text{bsc}}, 1]$, respectively. Denote by
$\underline{x}$ and $\overline{x}$ the positions of these two masses, respectively.

Lemma 9. Either $\underline{x} = 0$ holds or $\overline{x} = 1$ holds, or $\underline{x} = 0$ and
$\overline{x} = 1$ hold simultaneously.

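Proof: Suppose on the contrary that $\underline{x} \ne 0$ and $\overline{x} \ne 1$.
Then, consider decreasing $\underline{x}$ by $\zeta$, where $\zeta \in \mathbb{R}_+$ is sufficiently
small. Assume that $\overline{x}$ is increased by $\delta$ correspondingly, where
$\delta \in \mathbb{R}_+$ is sufficiently small. First-order Taylor expansion gives

$$h(\underline{x} - \zeta) = h(\underline{x}) - \zeta h'(\underline{x}) + O(\zeta^2),$$
$$h(\overline{x} + \delta) = h(\overline{x}) + \delta h'(\overline{x}) + O(\delta^2).$$

Eliminating second and higher order terms results in

$$h(\underline{x}) = h(\underline{x} - \zeta) + \zeta h'(\underline{x}),$$
$$h(\overline{x}) = h(\overline{x} + \delta) - \delta h'(\overline{x}).$$

Multiplying the above two equations by $\gamma$ and $1 - \gamma$, respectively,
and rearranging terms give

$$\gamma h(\underline{x}) = \gamma\left(h(\underline{x} - \zeta) + \frac{\zeta h'(\underline{x})}{h(\underline{x} - \zeta)}\, h(\underline{x} - \zeta)\right),$$
$$(1 - \gamma) h(\overline{x}) = (1 - \gamma)\left(h(\overline{x} + \delta) - \frac{\delta h'(\overline{x})}{h(\overline{x} + \delta)}\, h(\overline{x} + \delta)\right).$$

Now, for any sufficiently small $\zeta > 0$, one can pick $\delta > 0$
small enough, such that

$$-\frac{\gamma \zeta h'(\underline{x})}{h(\underline{x} - \zeta)} = -\frac{(1 - \gamma)\delta h'(\overline{x})}{h(\overline{x} + \delta)} \triangleq \gamma_0,$$

and $\gamma_0 > 0$, and we then have

$$\gamma h(\underline{x}) = (\gamma - \gamma_0)\, h(\underline{x} - \zeta),$$
$$(1 - \gamma) h(\overline{x}) = (1 - \gamma + \gamma_0)\, h(\overline{x} + \delta), \tag{10}$$

which means that by deleting $\gamma_0$ from $\gamma$ at position $\underline{x} - \zeta$, and
adding $\gamma_0$ to $1 - \gamma$ at position $\overline{x} + \delta$, one can keep the entropy
constraint satisfied. Now, denote by $\Lambda_{u'}(z)$ the result after the
above operations. Then, we can obtain

$$\begin{aligned} \Lambda_{u'}(z) - \Lambda_u(z) &= (\gamma - \gamma_0)(1 - z) + (1 - \gamma + \gamma_0)(1 - \overline{x} - \delta) - \gamma(1 - z) - (1 - \gamma)(1 - \overline{x}) \\ &= \gamma_0(z - \overline{x}) - (1 - \gamma + \gamma_0)\delta < 0, \end{aligned}$$

which shows that $\Lambda_u(z)$ is not minimal at $z$, contradicting the
assumption that the channel $u$ achieves the minimum value
$\underline{\Lambda}(z)$ at $z$. ■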

Lemma 9 shows that, for the channel $u$ achieving $\underline{\Lambda}(z)$
at $z$, its probability masses and in particular their associated
positions cannot be arbitrary. Indeed, only the following three
cases are possible.

(i) There is only one probability mass on the interval [0,1].
Then the channel $u$ is in fact a BSC such that there is a
probability mass 1 at position $1 - 2\epsilon_{\text{bsc}}$, and $\Lambda_{\text{bsc}}(z) =
2\epsilon_{\text{bsc}}$ if $z \in [0, 1 - 2\epsilon_{\text{bsc}})$, while $\Lambda_{\text{bsc}}(z) = 1 - z$ if
$z \in [1 - 2\epsilon_{\text{bsc}}, 1]$.

(ii) There are two probability masses $\gamma$ and $1 - \gamma$ at positions
$\underline{x}$ and $\overline{x}$, respectively. In particular, $\underline{x} = 0$, and $\overline{x} \in [1 -
2\epsilon_{\text{bsc}}, 1)$ and $\overline{x} > z$. In this case, we have $\Lambda_u(z) =
\gamma(1 - z) + (1 - \gamma)(1 - \overline{x})$, and the entropy constraint
is $\gamma + (1 - \gamma)h(\overline{x}) = 1 - c$. Notice that $\overline{x}$ and $\gamma$ are
parameters that should be optimized. Denote by $\Lambda_{\text{opt}}(z)$
the corresponding optimal value.

(iii) There are two probability masses $1 - c$ and $c$ at positions
$\underline{x} = 0$ and $\overline{x} = 1$; namely, the channel $u$ is a BEC, and
$\Lambda_{\text{bec}}(z) = (1 - c)(1 - z)$ for $z \in [0,1]$.

In order to find the minimum value $\underline{\Lambda}(z)$ at each point
$z \in [0,1]$, it now suffices to compare the point-wise results
of the above three cases. Case (ii) above is a non-trivial case,
since $\overline{x}$ and $\gamma$ are unknown parameters. However, the next
lemma characterizes the optimal solutions of these parameters.

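Lemma 10. The optimization problem

$$\begin{aligned} \underset{\gamma,\, \overline{x}}{\text{minimize}} \quad & \gamma(1 - z) + (1 - \gamma)(1 - \overline{x}), \\ \text{subject to} \quad & \gamma + (1 - \gamma)h(\overline{x}) = 1 - c, \end{aligned} \tag{11}$$

has the optimal solutions $\gamma(z) = (1 - c - h(\overline{x}(z)))/(1 -
h(\overline{x}(z)))$ and $\overline{x}(z)$ satisfying the fixed-point equation $\overline{x}(z) =
(1 - \overline{x}(z))^{\frac{z - 1}{z + 1}} - 1$.

Proof: See Appendix A for a complete proof. ■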
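The optimal parameters of Lemma 10 can be computed with a simple root finder. The sketch below is a minimal illustration under our own assumptions: it solves the stationarity condition in the logarithmic form derived in the appendix, (1 + z) log2(1 + x) + (1 - z) log2(1 - x) = 0, by bisection on (z, 1), where the left-hand side is positive at x = z and decreasing afterwards, and then compares the resulting objective with a brute-force scan over feasible positions. The names `x_bar`, `gamma_of` and `objective`, and the values c = 0.5 and z = 0.6, are illustrative choices.

```python
import math

def h2(p):
    """Binary entropy in bits, with h2(0) = h2(1) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)

def h(x):
    """Entropy kernel in the |D|-domain."""
    return h2((1.0 - x) / 2.0)

def x_bar(z):
    """Root in (z, 1) of (1+z) log2(1+x) + (1-z) log2(1-x) = 0, 0 < z < 1."""
    def g(x):
        return (1.0 + z) * math.log2(1.0 + x) + (1.0 - z) * math.log2(1.0 - x)
    lo, hi = z, 1.0 - 1e-12
    if g(hi) >= 0.0:           # root lies within 1e-12 of 1
        return 1.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if g(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def gamma_of(x, c):
    """Mass at position 0 that meets the entropy constraint for a given x."""
    return (1.0 - c - h(x)) / (1.0 - h(x))

def objective(x, z, c):
    """Value gamma (1 - z) + (1 - gamma)(1 - x) with gamma = gamma_of(x, c)."""
    g = gamma_of(x, c)
    return g * (1.0 - z) + (1.0 - g) * (1.0 - x)

c, z = 0.5, 0.6
x_opt = x_bar(z)
gamma_opt = gamma_of(x_opt, c)
best = objective(x_opt, z, c)

# Brute-force scan over feasible positions (gamma >= 0 needs h(x) <= 1 - c).
feasible = [k / 1000.0 for k in range(1000) if h(k / 1000.0) <= 1.0 - c]
scan_min = min(objective(x, z, c) for x in feasible)
print(x_opt, gamma_opt, best, scan_min)
```

The scan never finds a value below the one obtained from the fixed-point solution, and the optimum is below both the BSC and the BEC values at this z.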

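The next theorem characterizes the exact formula of $\underline{\Lambda}(z)$.

Theorem 11. Consider the channel family {BMS(c)}. Then,

$$\underline{\Lambda}(z) = \begin{cases} 2\epsilon_{\text{bsc}} & \text{if } 0 \le z < z_{\text{bsc}}, \\ \dfrac{1 - c - h(\overline{x}(z))}{1 - h(\overline{x}(z))}\,(1 - z) + \dfrac{c}{1 - h(\overline{x}(z))}\,(1 - \overline{x}(z)) & \text{if } z_{\text{bsc}} \le z < 1, \\ 0 & \text{if } z = 1, \end{cases}$$

where $\overline{x}(z)$ is the solution of $x = (1 - x)^{\frac{z - 1}{z + 1}} - 1$ and

$$z_{\text{bsc}} = \frac{\log_2(4\epsilon_{\text{bsc}}\bar{\epsilon}_{\text{bsc}})}{\log_2(\epsilon_{\text{bsc}}/\bar{\epsilon}_{\text{bsc}})}, \tag{12}$$

with $\bar{\epsilon}_{\text{bsc}} = 1 - \epsilon_{\text{bsc}}$.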



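Proof: First, the equation $\overline{x}(z) = (1 - \overline{x}(z))^{\frac{z - 1}{z + 1}} - 1$ for
$z \in [0, 1)$ is equivalent to

$$z(\overline{x}) = \frac{\log_2(1 - \overline{x}) + \log_2(1 + \overline{x})}{\log_2(1 - \overline{x}) - \log_2(1 + \overline{x})}, \tag{13}$$

where $\lim_{\overline{x} \to 1} z(\overline{x}) = 1$. Plugging $\overline{x} = 1 - 2\epsilon_{\text{bsc}}$ into (13)
yields (12). It is then clear that, for $z \in [0, z_{\text{bsc}})$,

$$\underline{\Lambda}(z) = \min\{\Lambda_{\text{bsc}}(z), \Lambda_{\text{bec}}(z)\} = 2\epsilon_{\text{bsc}}.$$

Note that, for $z = z_{\text{bsc}}$, we have $\Lambda_{\text{bsc}}(z) = \Lambda_{\text{opt}}(z)$. For
$z \in [z_{\text{bsc}}, 1)$, the monotonically non-increasing property of
the $\Lambda$ function shows that $\Lambda_{\text{opt}}(z) \le \Lambda_{\text{bsc}}(z)$. Moreover, since
$\overline{x}(z) \to 1$ as $z \to 1$, the continuity of $h(\cdot)$ shows that the
negative of the slope of $\Lambda_{\text{opt}}(z)$ is equal to

$$\lim_{z \to 1} \frac{1 - c - h(\overline{x}(z))}{1 - h(\overline{x}(z))} = 1 - c,$$

which is equal to the negative of the slope of $\Lambda_{\text{bec}}(z)$. Since
the slope of $\Lambda_{\text{opt}}(z)$ at $z_{\text{bsc}}$ is 0 and $\Lambda_{\text{bec}}(z)$ has a constant
slope, it follows from the convex-∩ property of the $\Lambda$ function
that $\Lambda_{\text{opt}}(z) \le \Lambda_{\text{bec}}(z)$, for $z \in [z_{\text{bsc}}, 1)$. Consequently, when
$z \in [z_{\text{bsc}}, 1)$, we have

$$\begin{aligned} \underline{\Lambda}(z) &= \gamma(z)(1 - z) + (1 - \gamma(z))(1 - \overline{x}(z)) \\ &= \frac{1 - c - h(\overline{x}(z))}{1 - h(\overline{x}(z))}\,(1 - z) + \frac{c}{1 - h(\overline{x}(z))}\,(1 - \overline{x}(z)). \end{aligned}$$

Trivially, when $z = 1$, $\underline{\Lambda}(z) = 0$, completing the proof. ■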



Now denote by $\Lambda_*(z)$ the $\Lambda$ function of the least upgraded
channel with respect to the whole channel family {BMS(c)}.
Then, it is not difficult to see the following.

Corollary 12. $\Lambda_*(z) = \underline{\Lambda}(z)$, for $z \in [0,1]$.

Denote by $C_*$ the capacity of the least upgraded channel.
Expressing $z$ in terms of $\overline{x}$, inserting the formula for $\Lambda_*(z(\overline{x}))$
into (7), and integrating over $\overline{x}$ gives $C_*$. However, there is no
simple closed-form formula for $C_*$. Instead, we compute
it numerically in Section IV.
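One way to carry out this numerical computation is sketched below. This is not necessarily the authors' procedure: instead of changing variables, it integrates over z directly with a midpoint rule, using the entropy representation H = int_0^1 Lambda(z) / (ln 2 (1 - z^2)) dz and the piecewise description of the point-wise minimum (constant 2 eps below z_bsc, the two-mass optimum above it). All function names, the bisection solver and the grid size are our own assumptions.

```python
import math

def h2(p):
    """Binary entropy in bits, with h2(0) = h2(1) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)

def h(x):
    """Entropy kernel in the |D|-domain."""
    return h2((1.0 - x) / 2.0)

def eps_bsc(c):
    """Solve 1 - h2(eps) = c for eps in (0, 1/2) by bisection."""
    lo, hi = 0.0, 0.5
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if h2(mid) < 1.0 - c:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def x_bar(z):
    """Root in (z, 1) of (1+z) log2(1+x) + (1-z) log2(1-x) = 0."""
    def g(x):
        return (1.0 + z) * math.log2(1.0 + x) + (1.0 - z) * math.log2(1.0 - x)
    lo, hi = z, 1.0 - 1e-12
    if g(hi) >= 0.0:           # root is numerically indistinguishable from 1
        return 1.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if g(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def lambda_min(z, c, eps, z_bsc):
    """Point-wise minimum of the Lambda function over channels of capacity c."""
    if z >= 1.0:
        return 0.0
    if z < z_bsc:
        return 2.0 * eps
    x = x_bar(z)
    hx = h(x)
    gamma = (1.0 - c - hx) / (1.0 - hx)
    return gamma * (1.0 - z) + (1.0 - gamma) * (1.0 - x)

def capacity_least_upgraded(c, n=2000):
    """Midpoint-rule estimate of 1 - H for the least upgraded channel."""
    eps = eps_bsc(c)
    z_bsc = math.log2(4.0 * eps * (1.0 - eps)) / math.log2(eps / (1.0 - eps))
    total = 0.0
    for k in range(n):
        z = (k + 0.5) / n
        total += lambda_min(z, c, eps, z_bsc) / (math.log(2.0) * (1.0 - z * z))
    return 1.0 - total / n

c = 0.5
u_gap = capacity_least_upgraded(c) - c
print(u_gap)
```

For c = 0.5 the resulting gap can be compared with the corresponding entry of Table I.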



IV. Simulation Results 

In this section, we provide several simulation results regarding
channel degradation and upgradation. From Fig. 1
one can see that the difference between $\overline{\Lambda}(z)$ and $\Lambda^*(z)$ is on
the interval $[0, 1 - 2\epsilon_{\text{bsc}})$. On this region, $\overline{\Lambda}(z)$ is convex-∪
while $\Lambda^*(z)$ is linear and thus convex-∩.

Theorem 3 and Theorem 11 suggest that channels with
very few probability masses always achieve the point-wise
maximum or point-wise minimum. Thus, we randomly generate
5,000 BMS channels of capacity 0.5 having 2 probability
masses, and 5,000 BMS channels of capacity 0.5 having 3
probability masses. Fig. 2 depicts the simulation results.
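A small-scale version of this experiment can be sketched as follows. The sampling scheme is an assumption of ours, since the text does not say how the random channels were drawn: we pick one position below and one above 1 - 2 eps uniformly at random and solve for the mass that gives capacity exactly 0.5. Only the two-mass case is covered, with 500 channels rather than 5,000, and each sampled Lambda function is checked against the point-wise maximum and minimum on a grid.

```python
import math
import random

def h2(p):
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)

def h(x):
    return h2((1.0 - x) / 2.0)

def eps_bsc(c):
    lo, hi = 0.0, 0.5
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if h2(mid) < 1.0 - c else (lo, mid)
    return 0.5 * (lo + hi)

def x_bar(z):
    def g(x):
        return (1.0 + z) * math.log2(1.0 + x) + (1.0 - z) * math.log2(1.0 - x)
    lo, hi = z, 1.0 - 1e-12
    if g(hi) >= 0.0:
        return 1.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if g(mid) > 0.0 else (lo, mid)
    return 0.5 * (lo + hi)

def lambda_max(z, c, eps):
    if z >= 1.0 - 2.0 * eps:
        return 1.0 - z
    return (1.0 - c) * (1.0 - z) / h(z)

def lambda_min(z, c, eps, z_bsc):
    if z >= 1.0:
        return 0.0
    if z < z_bsc:
        return 2.0 * eps
    x = x_bar(z)
    gamma = (1.0 - c - h(x)) / (1.0 - h(x))
    return gamma * (1.0 - z) + (1.0 - gamma) * (1.0 - x)

def random_two_mass_channel(c, eps, rng):
    """Draw x1 <= 1 - 2 eps <= x2 and choose the mass so the capacity is c."""
    x0 = 1.0 - 2.0 * eps
    x1, x2 = rng.uniform(0.0, x0), rng.uniform(x0, 1.0)
    denom = h(x1) - h(x2)
    alpha = (1.0 - c - h(x2)) / denom if denom > 1e-12 else 0.5
    return [alpha, 1.0 - alpha], [x1, x2]

c = 0.5
eps = eps_bsc(c)
z_bsc = math.log2(4.0 * eps * (1.0 - eps)) / math.log2(eps / (1.0 - eps))
grid = [k / 200.0 for k in range(201)]
upper = [lambda_max(z, c, eps) for z in grid]
lower = [lambda_min(z, c, eps, z_bsc) for z in grid]

rng = random.Random(0)      # fixed seed for reproducibility
violations = 0
for _ in range(500):
    masses, points = random_two_mass_channel(c, eps, rng)
    for z, lo_val, hi_val in zip(grid, lower, upper):
        val = sum(a * (1.0 - max(z, x)) for a, x in zip(masses, points))
        if val < lo_val - 1e-9 or val > hi_val + 1e-9:
            violations += 1
print("violations:", violations)
```

Every sampled curve should stay between the two envelopes, which is the numerical counterpart of the grey band in Fig. 2.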

Other interesting quantities are the gap between $c$ and the
capacity of the least degraded channel, call it $d^{\text{gap}}$, and the gap
between the capacity of the least upgraded channel and $c$, call
it $u^{\text{gap}}$. See Table I for details.

Fig. 1: Left: The point-wise maximum $\overline{\Lambda}(z)$ (dashed line) and the point-wise
minimum $\underline{\Lambda}(z)$ (solid line) of $\Lambda_a(z)$, for all $a \in$ {BMS(0.5)}. Right: The
$\Lambda$ functions of the least degraded channel $\Lambda^*(z)$ (dashed line) and the least
upgraded channel $\Lambda_*(z)$ (solid line), respectively.

Fig. 2: The $\Lambda$ functions of randomly generated BMS channels (grey lines) from
{BMS(0.5)}, and the point-wise maximum $\overline{\Lambda}(z)$ (dashed line) and minimum
$\underline{\Lambda}(z)$ (solid line) of $\Lambda_a(z)$, for all $a \in$ {BMS(0.5)}.

  c      d^gap     u^gap
  0.1    0.0728    0.0686
  0.2    0.1222    0.1009
  0.3    0.1552    0.1185
  0.4    0.1739    0.1257
  0.5    0.1795    0.1243
  0.6    0.1721    0.1155
  0.7    0.1516    0.0995
  0.8    0.1175    0.0762
  0.9    0.0684    0.0446

TABLE I: The gap between $c$ and $C^*$ is $d^{\text{gap}} = c - C^*$, and the gap between
$C_*$ and $c$ is $u^{\text{gap}} = C_* - c$.

Fig. 3 depicts the capacity $C^*$ ($C_*$, respectively) of the
least degraded channel (resp. the least upgraded channel) as a
function of $c$, where $c$ ranges from 0.001 to 0.999 with a step
size of 0.001. We then compute the maximum value of $d^{\text{gap}}$,
which is approximately 0.1795 and corresponds to $c = 0.4940$,
while the maximum value of $u^{\text{gap}}$ is approximately 0.1261,
which corresponds to $c = 0.4310$.
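The scan for the degraded-side gap is easy to reproduce. The sketch below assumes the closed form C^* = c^2 / (1 - 2 eps) for the capacity of the least degraded channel (our reading of (6), which agrees with the entries of Table I) and uses the same grid of c values as described above; the function names are hypothetical.

```python
import math

def h2(p):
    """Binary entropy in bits, with h2(0) = h2(1) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)

def eps_bsc(c):
    """Solve 1 - h2(eps) = c for eps in (0, 1/2) by bisection."""
    lo, hi = 0.0, 0.5
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if h2(mid) < 1.0 - c:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def d_gap(c):
    """Gap c - C^*, assuming C^* = c^2 / (1 - 2 eps_bsc)."""
    eps = eps_bsc(c)
    return c - c * c / (1.0 - 2.0 * eps)

# c ranges from 0.001 to 0.999 with a step size of 0.001.
c_grid = [k / 1000.0 for k in range(1, 1000)]
best_c = max(c_grid, key=d_gap)
best_gap = d_gap(best_c)
print(best_c, best_gap)
```

The gap curve is quite flat around its maximum, so the location of the maximizer is less sharply determined than the maximum value itself.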



Fig. 3: The channel capacity of the least degraded channel (dashed line) as
a function of the underlying channel capacity $c$, $0 < c < 1$; the channel
capacity of the least upgraded channel (dotted line) as a function of the
underlying channel capacity $c$; and the underlying channel capacity (solid
line) as a function of itself.



V. Conclusion and Open Problems

The least degraded channel and the least upgraded channel
with respect to the family of BMS channels of fixed capacity
were introduced in this paper. Also, their characterizations and
capacity formulae were derived. An interesting open question
is to consider other families of channels. E.g., consider the
family of channels which are the result of taking the family of
channels of fixed capacity $c$ and performing one polarization
step on them. We reserve such questions for future work.

Appendix

A. Proof of Lemma 10

Proof: Define $f(\gamma, \overline{x}, \eta) = \gamma(1 - z) + (1 - \gamma)(1 -
\overline{x}) - \eta\left(\gamma + (1 - \gamma)h(\overline{x}) - 1 + c\right)$. Then, taking the first-order
derivatives with respect to $\overline{x}$, $\gamma$, and $\eta$ yields

$$\frac{\partial}{\partial \overline{x}} f(\gamma, \overline{x}, \eta) = \gamma - 1 - \eta(1 - \gamma)h'(\overline{x}), \tag{14}$$
$$\frac{\partial}{\partial \gamma} f(\gamma, \overline{x}, \eta) = 1 - z - (1 - \overline{x}) - \eta + \eta h(\overline{x}), \tag{15}$$
$$\frac{\partial}{\partial \eta} f(\gamma, \overline{x}, \eta) = -\left(\gamma + (1 - \gamma)h(\overline{x}) - 1 + c\right). \tag{16}$$

Now setting the first-order derivatives to be 0 gives

$$\gamma = \frac{1 - c - h(\overline{x})}{1 - h(\overline{x})} \quad \text{and} \quad \eta = \frac{\overline{x} - z}{1 - h(\overline{x})}. \tag{17}$$

Plugging (17) into (14), and using $h'(x) = -\frac{1}{2}\log_2\frac{1 + x}{1 - x}$ together with
$1 - h(x) = \frac{1 + x}{2}\log_2(1 + x) + \frac{1 - x}{2}\log_2(1 - x)$, some manipulations give

$$(1 + z)\log_2(1 + \overline{x}) + (1 - z)\log_2(1 - \overline{x}) = 0,$$

which is equivalent to

$$\overline{x} = (1 - \overline{x})^{\frac{z - 1}{z + 1}} - 1.$$

■